111 research outputs found

    Big tranSMART for clinical decision making

    Patient stratification based on molecular profiling data plays a key role in clinical decision making, such as identifying disease subgroups and predicting the treatment responses of individual subjects. Existing knowledge management systems like tranSMART enable scientists to perform such analyses. In the big data era, however, molecular profiling data volumes are growing sharply owing to new biological techniques such as next-generation sequencing, and none of the existing storage systems copes well with the three "V" features of big data (Volume, Variety, and Velocity). Key-value data stores like Apache HBase and Google Bigtable can provide high-speed queries by key. These databases can be modeled as a Distributed Ordered Table (DOT), which horizontally partitions a table into regions and distributes the regions to region servers by key. However, none of the existing data models works well on a DOT. A Collaborative Genomic Data Model (CGDM) has been designed to address these issues. CGDM creates three Collaborative Global Clustering Index Tables to improve data query velocity. The microarray implementation of CGDM on HBase performed up to 246, 7 and 20 times faster than the relational data model on HBase, MySQL Cluster and MongoDB, respectively. The single nucleotide polymorphism implementation of CGDM on HBase outperformed the relational model on HBase and MySQL Cluster by up to 351 and 9 times, respectively. The raw sequence implementation of CGDM on HBase gains up to 440-fold and 22-fold speedups compared to the sequence alignment map format implemented in HBase and a binary alignment map server, respectively. Integration into tranSMART shows up to a 7-fold speedup in the data export function. In addition, a popular hierarchical clustering algorithm in tranSMART has been used as an application to show how CGDM can influence the velocity of the algorithm: the optimized method using CGDM performs more than 7 times faster than the same method using the relational model implemented in MySQL Cluster.
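    As a rough illustration only (not the authors' implementation, and with purely hypothetical class, table and key names), the sketch below mimics the core idea at toy scale: a Distributed Ordered Table keeps rows sorted by key so that prefix range scans are cheap, and a secondary "global clustering index" table keyed by (gene, sample) turns a by-gene query over data stored by (sample, gene) into a single ordered scan. CGDM's actual three index tables and HBase region handling are not reproduced here.

```python
from bisect import bisect_left, bisect_right

class OrderedTable:
    """Toy stand-in for a Distributed Ordered Table (e.g. an HBase table):
    rows are kept sorted by key, so prefix range scans are cheap."""
    def __init__(self):
        self.keys = []      # sorted row keys
        self.rows = {}      # key -> value

    def put(self, key, value):
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)
        self.rows[key] = value

    def scan_prefix(self, prefix):
        lo = bisect_left(self.keys, prefix)
        hi = bisect_right(self.keys, prefix + "\xff")   # ASCII keys assumed
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]

# Primary table: expression values keyed by (sample, gene).
primary = OrderedTable()
# Secondary "global clustering index" keyed by (gene, sample), pointing back
# at the primary row key, so a by-gene query becomes one ordered prefix scan.
gene_index = OrderedTable()

def put_expression(sample, gene, value):
    row_key = f"{sample}|{gene}"
    primary.put(row_key, value)
    gene_index.put(f"{gene}|{sample}", row_key)

def query_by_gene(gene):
    """Fetch all samples' values for one gene via the index table."""
    return [(k.split("|")[1], primary.rows[ref])
            for k, ref in gene_index.scan_prefix(gene + "|")]

put_expression("patient01", "TP53", 7.2)
put_expression("patient02", "TP53", 5.9)
put_expression("patient01", "BRCA1", 3.1)
print(query_by_gene("TP53"))   # [('patient01', 7.2), ('patient02', 5.9)]
```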

    Adapt Anything: Tailor Any Image Classifiers across Domains And Categories Using Text-to-Image Diffusion Models

    We do not pursue a novel method in this paper; rather, we study whether a modern text-to-image diffusion model can tailor a task-adaptive image classifier across arbitrary domains and categories. Existing domain-adaptive image classification works exploit both source and target data for domain alignment, so as to transfer the knowledge learned from labeled source data to unlabeled target data. However, with the development of text-to-image diffusion models, we ask whether high-fidelity synthetic data from a text-to-image generator can serve as a surrogate for real-world source data. In this way, we no longer need to collect and annotate source data for each domain adaptation task in a one-for-one manner. Instead, we use a single off-the-shelf text-to-image model to synthesize images whose category labels are derived from the corresponding text prompts, and then leverage this surrogate data as a bridge to transfer the knowledge embedded in the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator together with the corresponding unlabeled target data. Extensive experiments validate the feasibility of the proposed idea, which even surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.
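    A minimal Python sketch of the surrogate-source idea is given below; the Hugging Face diffusers pipeline, the model id, the prompt template, the example class list and the uda_train() call are all illustrative assumptions rather than the paper's actual setup.

```python
# Sketch only: build a labeled "surrogate source" set with a text-to-image
# model, then hand it to any off-the-shelf unsupervised domain adaptation
# (UDA) method in place of real annotated source data.
import torch
from diffusers import StableDiffusionPipeline

# Model id is an assumption; any text-to-image pipeline would do. Needs a GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

classes = ["dog", "elephant", "giraffe", "guitar", "horse", "house", "person"]
surrogate_source = []                       # list of (PIL image, label) pairs
for label, name in enumerate(classes):
    for _ in range(4):                      # a handful of samples per class
        img = pipe(f"a photo of a {name}").images[0]
        surrogate_source.append((img, label))

# surrogate_source now plays the role of the labeled source domain; pair it
# with the unlabeled target images in a standard UDA trainer, e.g.:
# classifier = uda_train(source=surrogate_source, target=unlabeled_target_imgs)
```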

    Large-Scale Automatic K-Means Clustering for Heterogeneous Many-Core Supercomputer

    Funding: UK EPSRC grants "Discovery" EP/P020631/1 and "ABC: Adaptive Brokerage for the Cloud" EP/R010528/1. This article presents an automatic k-means clustering solution targeting the Sunway TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that partitions not only by dataflow and centroid but also by dimension, unlocking the hierarchical parallelism of the heterogeneous many-core processor and the system architecture of the supercomputer. The parallel design can process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids while maintaining high performance and high scalability. Furthermore, we propose an automatic hyper-parameter determination process for k-means clustering that automatically generates and executes clustering tasks for a set of candidate hyper-parameters and then selects the optimal hyper-parameter using a proposed evaluation method. The proposed auto-clustering solution not only achieves high performance and scalability for problems with massive high-dimensional data, but also supports clustering without sufficient prior knowledge of the number of target clusters, which can potentially extend the k-means algorithm to new application areas.
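    The candidate-generation-and-evaluation loop can be illustrated on a single node as below; the silhouette score is only a stand-in for the paper's own evaluation method, and nothing of the Sunway-specific multilevel parallel partitioning is reproduced here.

```python
# Single-node sketch of the "generate candidates, run, evaluate, pick" loop
# behind automatic hyper-parameter (k) determination for k-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def auto_kmeans(X, candidate_ks):
    """Run k-means for each candidate k and keep the best-scoring model."""
    best_k, best_score, best_model = None, -1.0, None
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        score = silhouette_score(X, model.labels_)   # stand-in quality metric
        if score > best_score:
            best_k, best_score, best_model = k, score, model
    return best_k, best_model

X = np.random.rand(1000, 16)      # toy data; the paper targets far larger scales
k, model = auto_kmeans(X, candidate_ks=range(2, 11))
print("selected k =", k)
```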

    Attention Diversification for Domain Generalization

    Convolutional neural networks (CNNs) have demonstrated gratifying results at learning discriminative features. However, when applied to unseen domains, state-of-the-art models are usually prone to errors due to domain shift. Investigating this issue from the perspective of shortcut learning, we find that models trained on different domains are merely biased toward different domain-specific features while overlooking diverse task-related features. Guided by this observation, we propose a novel Attention Diversification framework in which Intra-Model and Inter-Model Attention Diversification Regularization collaborate to reassign appropriate attention to diverse task-related features. Briefly, Intra-Model Attention Diversification Regularization is applied to high-level feature maps to achieve in-channel discrimination and cross-channel diversification by forcing different channels to pay their most salient attention to different spatial locations. Inter-Model Attention Diversification Regularization further provides task-related attention diversification and domain-related attention suppression through a "simulate, divide and assemble" paradigm: simulate domain shift by exploiting multiple domain-specific models, divide attention maps into task-related and domain-related groups, and assemble them within each group to apply the regularization. Extensive experiments and analyses on various benchmarks demonstrate that our method achieves state-of-the-art performance over other competing methods (ECCV 2022). Code is available at https://github.com/hikvision-research/DomainGeneralization.
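    As one possible reading of the intra-model part (not the paper's exact regulariser), the PyTorch sketch below normalises each channel's spatial attention and penalises overlap between channels, so different channels are pushed to attend to different spatial locations; the function name and the penalty form are illustrative assumptions.

```python
# Hedged sketch of an intra-model attention diversification penalty.
import torch
import torch.nn.functional as F

def intra_attention_diversification(feat: torch.Tensor) -> torch.Tensor:
    """feat: high-level feature map of shape (B, C, H, W)."""
    b, c, h, w = feat.shape
    attn = F.softmax(feat.view(b, c, h * w), dim=-1)   # per-channel spatial attention
    attn = F.normalize(attn, dim=-1)                   # unit norm for cosine similarity
    sim = torch.bmm(attn, attn.transpose(1, 2))        # (B, C, C) channel-overlap matrix
    off_diag = sim - torch.diag_embed(torch.diagonal(sim, dim1=1, dim2=2))
    return off_diag.abs().mean()                       # small when channels diverge

feat = torch.randn(8, 64, 7, 7, requires_grad=True)
loss = intra_attention_diversification(feat)
loss.backward()   # add this term (suitably scaled) to the task loss during training
```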

    Tribological properties and wear mechanisms of DC pulse plasma nitrided austenitic stainless steel in dry reciprocating sliding tests

    Expanded austenite (Gamma-N), or S-phase, is a special phase of low-temperature nitrided austenite containing highly super-saturated nitrogen in the form of heterogeneous Cr-N nano-clusters. A nitrided layer of single-phase Gamma-N is known to give austenitic stainless steel a combination of high hardness, good wear resistance and superior corrosion resistance. This paper reports recent experiments in a comparative study of the sliding wear properties and wear mechanisms of nitrided austenitic stainless steel AISI 316, with special attention paid to worn-surface structural evolutions induced by frictional heating and sliding deformation. The samples were prepared by DC pulsed plasma nitriding treatments of various durations at a fixed power. Knoop micro-indentation revealed the hardening behaviour of the nitrided samples. The reciprocating ball-on-disc sliding wear and friction properties were investigated under ambient environmental conditions using an alumina counterpart ball. The worn surfaces were analysed by XRD, FEG-SEM and EDX to show wear-induced changes in the crystalline characteristics and the wear mechanisms of tribo-oxidation, cracking, abrasive wear and ploughing deformation. Moreover, longitudinal cross-sectional foils of the worn samples were prepared and analysed using TEM to investigate the wear-induced structural changes, including tribofilm formation, plastic deformation and delamination, at nanometre-scale depths.

    Laryngeal Reinnervation Using Ansa Cervicalis for Thyroid Surgery-Related Unilateral Vocal Fold Paralysis: A Long-Term Outcome Analysis of 237 Cases

    To evaluate the long-term efficacy of delayed laryngeal reinnervation using the main branch of the ansa cervicalis in the treatment of unilateral vocal fold paralysis (UVFP) caused by thyroid surgery. UVFP remains a serious complication of thyroid surgery, and a completely satisfactory surgical treatment has so far been elusive. From Jan. 1996 to Jan. 2008, a total of 237 UVFP patients who underwent ansa cervicalis main branch-to-recurrent laryngeal nerve (RLN) anastomosis were enrolled as the UVFP group; another 237 age- and gender-matched normal subjects served as the control group. Videostroboscopy, vocal function assessment (acoustic analysis, perceptual evaluation and maximum phonation time) and electromyography were performed preoperatively and postoperatively. The mean follow-up period was 5.2±2.7 years, ranging from 2 to 12 years. Postoperative vocal function measures did not differ significantly between the UVFP and control groups (p > 0.05), and postoperative laryngeal electromyography confirmed successful reinnervation of the laryngeal muscles. Delayed laryngeal reinnervation with the main branch of the ansa cervicalis is a feasible and effective approach for the treatment of thyroid surgery-related UVFP; it can restore physiological laryngeal phonatory function to normal or nearly normal voice quality.

    The effect of precursor concentration on the particle size, crystal size, and optical energy gap of CexSn1−xO2 nanofabrication

    In the present work, a thermal treatment technique is applied to the synthesis of CexSn1−xO2 nanoparticles. Using this method has developed an understanding of how lower and higher precursor values affect the morphology, structure, and optical properties of CexSn1−xO2 nanoparticles. The synthesis involves a reaction between the cerium and tin sources, namely cerium nitrate hexahydrate and tin(II) chloride dihydrate, respectively, and the capping agent polyvinylpyrrolidone (PVP). The findings indicate that lower x values yield smaller particle sizes with a higher energy band gap, while higher x values yield larger particle sizes with a smaller energy band gap. Thus, products with lower x values may suit antibacterial applications, as smaller particles can diffuse through the cell wall faster, while products with higher x values may suit solar cell applications, as more electrons can be generated at larger particle sizes. The synthesized samples were characterized by scanning electron microscopy (SEM), transmission electron microscopy (TEM), X-ray diffraction (XRD), and Fourier transform infrared spectroscopy (FT-IR). XRD pattern analysis revealed that the CexSn1−xO2 nanoparticles formed after calcination adopt the cubic fluorite and cassiterite-type tetragonal structures. FT-IR analysis confirmed Ce-O and Sn-O as the primary bonds of the prepared CexSn1−xO2 nanoparticle samples, whilst TEM analysis showed that the average particle size was in the range 6−21 nm as the precursor concentration (Ce(NO3)3·6H2O) increased from 0.00 to 1.00. Moreover, the diffuse UV-visible reflectance spectra, used to determine the optical band gap based on the Kubelka–Munk equation, showed that an increase in x value decreases the energy band gap and vice versa.
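    For reference, the Kubelka–Munk treatment mentioned above converts the diffuse reflectance R into an absorption-proportional function and, via a Tauc-type plot, yields the optical band gap; the form below is the standard textbook version (written for a direct allowed transition, which is an assumption on my part), not an equation reproduced from the paper. Extrapolating the linear region of (F(R) hν)^2 versus hν to zero gives E_g.

```latex
% Kubelka-Munk function of the diffuse reflectance R, and the Tauc-type
% relation used to extract the optical band gap E_g (direct allowed
% transition assumed); A is a proportionality constant.
F(R) = \frac{(1 - R)^{2}}{2R}, \qquad
\left( F(R)\, h\nu \right)^{2} = A \left( h\nu - E_{g} \right)
```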

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents covering a variety of research fields, against which newly developed literature search techniques could be compared, improved and translated into practice. To overcome this bottleneck, we established the RElevant LIterature SearcH consortium, consisting of more than 1500 scientists from 84 countries, who collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to capture all relevant articles. The established database server, located at https://relishdb.ict.griffith.edu.au, is freely available for downloading annotation data and for the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new, more powerful techniques for title- and title/abstract-based search engines for relevant articles in biomedical research.
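    For illustration only, the few lines of Python below show the flavour of a TF-IDF cosine-similarity ranking on made-up abstracts; this is not the benchmark's or the baseline's exact configuration, and the documents and variable names are hypothetical.

```python
# Toy TF-IDF title/abstract similarity ranking in the spirit of one baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstracts = {
    "seed":  "deep learning for protein structure prediction",
    "doc_a": "neural networks predict protein folding and structure",
    "doc_b": "survey of crop irrigation scheduling methods",
}
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(abstracts.values())
sims = cosine_similarity(X[0], X[1:]).ravel()    # seed vs candidate documents
for name, s in zip(list(abstracts)[1:], sims):
    print(f"{name}: {s:.3f}")                    # doc_a should rank above doc_b
```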